
CHAPTER III 


WHY REGRESS? 




The REGRESSION command of Tiny Troll affords the 
business and financial user, in particular, access to a 
very powerful tool for developing models for understanding 
past trends and for forecasting future trends. This 
chapter provides a basic introduction to multiple linear 
regression and helps you to understand and interpret the 
output of the REGRESSION command. 

We are grateful to the National Association of 
Business Economists, and especially to Mr. Donald 
L. McLagan, vice president of Data Resources, Inc., for 
permission to reprint the following article by 
Mr. McLagan, "A Non-econometrician’s Guide to 
Econometrics," which appeared in the May, 1973 edition of 
Business Economics . This article gives a cogent 
explanation of the basics of multiple linear regression 
describing most of the statistics which are output by the 
REGRESSION command. Immediately following the article, 
the few items in the Tiny Troll output which are not 
mentioned in the article will be explained. 









If you are interested in a more rigorous explanation
of multiple linear regression, please refer to the books
named in the footnote on the last page of Mr. McLagan's
article.


A Non-econometrician’s 
Guide to Econometrics 




Donald L. McLagan

Vice President 
Data Resources, Inc. 


A quick reference on the concepts, techniques,
and pitfalls for selecting regression
equations. The author avoids being rigorously
precise and thereby reaches behind
the technical jargon in providing useful
guidelines for applying regression analysis
to business and economic forecasting.


ECONOMETRICS is being used increasingly by
business economists to anticipate the effects
of economic change on their business. Techniques
which were confined within the walls of academic
institutions ten years ago are now the analytic
allies of corporate, financial and government
forecasters. Unfortunately, the transition of
econometrics from academic to business applications has
not been accompanied by supporting documentation
of terminology, expected benefits and shortcomings
oriented towards the practical user.

The purpose of this article is to provide a quick 
reference to the concepts, pitfalls and techniques 
for selecting regression equations — regression 
analysis being the principal tool of econometrics. It 
is not the intent of this article to be rigorously 
precise. Rather, it attempts to reach behind the 
technical jargon to find useful guidelines for the 
application of regression to business and economic 
forecasting. 1 


See end of text for footnotes. 


SOME REGRESSION CONCEPTS 
AND TERMINOLOGY 

LEAST SQUARES (Multiple Regression) 

LEAST SQUARES is a method of developing an 
equation which relates one variable (such as a com¬ 
pany's sales) to one or more other variables which 
should explain the first (such as price, economic 
demands, competition, etc.). This method is math¬ 
ematically contrived so that the resulting combina¬ 
tion of explanatory variables produces the smallest 
error between the historic actual values and those 
estimated by the regression. 

The output of a LEAST SQUARES analysis provides
an equation for the variable being explained
(the left-hand or dependent variable). In this
equation, the left-hand variable equals the sum of the
explanatory (the right-hand or independent) variables,
each multiplied by a regression ESTIMATED
COEFFICIENT.

Example 1 is in the format of a timesharing
computer program print-out which estimates a
Retail Company's Sales (SALES) as a function of
Disposable Income (YD) and the Unemployment
Rate (RU) with LEAST SQUARES. The equation
indicates that SALES equals a constant of -2.1
plus an ESTIMATED COEFFICIENT of 14.8 times
Disposable Income minus an ESTIMATED
COEFFICIENT of 173.0 times the Unemployment
Rate. LEAST SQUARES analysis produces several
statistics which indicate the success achieved with
the estimating equation. These statistics are shown
in the print-out and are discussed below.

R-Bar Squared 

The R-BAR SQUARED is the best known indicator
of the success of a LEAST SQUARES fit.
Basically, the R-BAR SQUARED measures the
percentage of the change in the left-hand (or
dependent) variable in the equation which has been
explained by changes in the explanatory variables. In
the case of a perfect explanation, the R-BAR
SQUARED will be 1.0. Generally, the higher the
R-BAR SQUARED, the better the equation.


EXAMPLE 1


ORDINARY LEAST SQUARES

FREQUENCY: ANNUAL
INTERVAL: 59:1 TO 71:1
LEFT-HAND VARIABLE: SALES

RIGHT-HAND
VARIABLE        T-STATISTIC
CONSTANT        -.812885E-02
YD              56.1398
RU              -4.93043

Resulting Regression Equation:
SALES = -2.1 + 14.8 YD - 173.0 RU

R-BAR SQUARED: 0.9971
DURBIN-WATSON STAT. (ADJ. FOR 0. GAPS): 1.9724
STANDARD ERROR OF THE REGRESSION: 112.367

DATE    ACTUAL    FITTED
59      4036.     4054.
60      4134.     4226.
61      4268.     4241.
62      4603.     4744.
63      5116.     5018.
64      5740.     5598.
65      6390.     6231.
66      6805.     6927.
67      7330.     7428.
68      8198.     8140.
69      8863.     8793.
70      9262.     9355.
71      10006     9996.

[PLOT: * = ACTUAL; + = FITTED]


In Example 1, 99.71% of the change in SALES
(over the interval 1959 to 1971) has been
explained by a constant and changes in Disposable
Income and changes in the Unemployment Rate.
There are other indicators of the success of the
LEAST SQUARES fit which must be looked at
beyond the R-BAR SQUARED. Among these are the
T-STATISTICS of the coefficients, the STANDARD
ERROR OF THE REGRESSION, and the
DURBIN-WATSON STATISTIC.
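
The R-BAR SQUARED of Example 1 can be approximately
recomputed from its ACTUAL and FITTED columns. The sketch
below is illustrative only: it assumes R-BAR SQUARED is the
usual degrees-of-freedom-adjusted R squared (the article does
not give the formula), and it works from the rounded values in
the print-out, so the last digit may differ.

```python
# ACTUAL and FITTED columns of Example 1 (1959 to 1971).
actual = [4036, 4134, 4268, 4603, 5116, 5740, 6390,
          6805, 7330, 8198, 8863, 9262, 10006]
fitted = [4054, 4226, 4241, 4744, 5018, 5598, 6231,
          6927, 7428, 8140, 8793, 9355, 9996]

n = len(actual)  # 13 annual observations
k = 2            # explanatory variables besides the constant (YD, RU)

mean_actual = sum(actual) / n
ss_resid = sum((a - f) ** 2 for a, f in zip(actual, fitted))
ss_total = sum((a - mean_actual) ** 2 for a in actual)

# Share of the variation in SALES explained by the equation.
r_squared = 1 - ss_resid / ss_total

# Assumed textbook adjustment for degrees of freedom.
r_bar_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

print(f"R squared:     {r_squared:.4f}")
print(f"R-bar squared: {r_bar_squared:.4f}")
```

The adjustment penalizes extra explanatory variables, which is
why R-BAR SQUARED is a fairer basis than plain R squared for
comparing equations with different numbers of variables.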


Fitted and Actual Values 

FITTED values are estimates of the LEFT 
HAND VARIABLE developed by using the historic 
explanatory variables in the regression equation. 
Thus, the FITTED values are the predictions which 
would have been achieved had the regression equa¬ 
tion been used in the past. 

In Example 1, the FITTED value for 1959 is 
4054 versus the ACTUAL value of 4036. Compar¬ 
ing the FITTED values with the ACTUAL values is 
often called “ex post” analysis (i.e. with the 
benefit of hindsight since the FITTED values are 
from the interval used to develop the regression 
equation in the first place). 




Standard Error of the Regression

The STANDARD ERROR OF THE REGRES¬ 
SION gives a measure of how close the FITTED 
values have been to the ACTUAL values in the 
past. When regression analysis is being used to 
develop equations, it is desirable to have the 
STANDARD ERROR OF THE REGRESSION as 
small as possible. Also, this statistic may be used to 
gain some idea of the forecasting accuracy which 
can be expected. 

In Example 1, the STANDARD ERROR OF
THE REGRESSION is 112. This means that:

• in roughly 67% of the historical observations, the
estimate is within ± 1 STANDARD ERROR of
the ACTUAL (± 112 in Example 1).

• in roughly 95% of the historical observations, the
estimate is within ± 2 STANDARD ERRORS of
the ACTUAL (± 224 in Example 1).

• in roughly 99% of the historical observations, the
estimate is within ± 3 STANDARD ERRORS of
the ACTUAL (± 336 in Example 1).
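
The bands above can be checked against the print-out of
Example 1. The following is a minimal sketch, not part of the
original article: it counts how many of the thirteen
observations fall inside each band, and it recomputes the
standard error under the assumed definition (square root of the
sum of squared errors divided by observations minus estimated
coefficients). Because the FITTED values are rounded, the
recomputed figure only approximates the reported 112.367.

```python
import math

# ACTUAL and FITTED columns of Example 1 (1959 to 1971).
actual = [4036, 4134, 4268, 4603, 5116, 5740, 6390,
          6805, 7330, 8198, 8863, 9262, 10006]
fitted = [4054, 4226, 4241, 4744, 5018, 5598, 6231,
          6927, 7428, 8140, 8793, 9355, 9996]

REPORTED_STD_ERROR = 112.367  # from the Example 1 print-out
N_COEFFICIENTS = 3            # constant, YD and RU

residuals = [f - a for a, f in zip(actual, fitted)]

# Assumed definition: root of squared errors over degrees of freedom.
std_error = math.sqrt(
    sum(r * r for r in residuals) / (len(residuals) - N_COEFFICIENTS)
)


def count_within(multiple):
    """Count observations whose error is within `multiple` standard errors."""
    band = multiple * REPORTED_STD_ERROR
    return sum(1 for r in residuals if abs(r) <= band)


print(f"recomputed standard error: {std_error:.1f}")
for multiple in (1, 2, 3):
    inside = count_within(multiple)
    print(f"within {multiple} standard error(s): {inside} of {len(residuals)}")
```

With only thirteen observations the observed shares can only
approximate the 67%, 95% and 99% rules of thumb.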

T-Statistic 

The T-STATISTIC shows the significance of
each explanatory variable in predicting the dependent
variable. It is desirable to have as large (either
positive or negative) a T-STATISTIC as possible for
each explanatory variable. Generally, any statistic
greater than +2.0 or less than -2.0 is acceptable. If
the T-STATISTIC is between -2.0 and +2.0 for
any independent variable, that variable is not
contributing significantly to explaining the dependent
variable. Explanatory variables with low
T-STATISTICS can usually be eliminated from the
regression without substantially decreasing the
R-BAR SQUARED or increasing the STANDARD
ERROR OF THE REGRESSION. (Basically the
T-STATISTIC is a measure of how unlikely it is
that there is no relationship between the dependent
variable and an independent variable. Thus a
high T-STATISTIC would make it very unlikely
that the indicated relationship is just a fluke.)

In Example 1, the T-STATISTICS on Disposable
Income (56.1) and the Unemployment Rate (-4.9)
are both well beyond the threshold. This means
that both variables contribute significantly to
explaining the Retail Company's Sales. The constant
(-.008) is not significant and could be dropped
from the equation.

Standard Error (of the coefficients) 

The coefficients of the regression are estimated
statistically. The ESTIMATED COEFFICIENTS
are the best estimates of the coefficients for each
explanatory variable. The STANDARD ERROR of
the coefficient gives an estimate of the range in
which that coefficient may "actually" be. Like the
T-STATISTIC, the STANDARD ERROR of the
coefficient indicates how significant each independent
variable is in explaining the dependent variable.
In fact, the T-STATISTIC is developed by
dividing the ESTIMATED COEFFICIENT for a
variable by its STANDARD ERROR. Thus, the
T-STATISTIC really measures how many STANDARD
ERRORS the coefficient is away from 0. If
the ESTIMATED COEFFICIENT of an independent
variable were 0, no relationship between the
dependent variable and this particular independent
variable would be indicated.

In Example 1, the STANDARD ERROR of the
constant (258.9) is considerably greater than the
ESTIMATED COEFFICIENT of the constant
(-2.1). This means that the constant is not
significantly different from 0. The STANDARD ERRORS
for Disposable Income (0.26) and for the
Unemployment Rate (35.1) are small relative to their
ESTIMATED COEFFICIENTS (14.8 and -173.0
respectively), indicating that both explanatory
variables have an effect substantially different from zero
in explaining SALES, and that the association
is unlikely to be due to chance.
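
The division described above can be sketched in a few lines.
This is an illustration, not the program's own code: the
coefficients and standard errors are the rounded figures quoted
in the text for Example 1, so the results differ slightly from
the print-out (for instance about 56.9 instead of 56.1398 for
YD). The 2.0 threshold is the rule of thumb given in the
T-Statistic section.

```python
# (ESTIMATED COEFFICIENT, STANDARD ERROR), rounded as quoted in the text.
example_1 = {
    "CONSTANT": (-2.1, 258.9),
    "YD": (14.8, 0.26),
    "RU": (-173.0, 35.1),
}

THRESHOLD = 2.0  # rule of thumb from the text


def t_statistic(coefficient, std_error):
    """Number of standard errors the coefficient lies away from zero."""
    return coefficient / std_error


t_stats = {name: t_statistic(c, se) for name, (c, se) in example_1.items()}
significant = {name: abs(t) > THRESHOLD for name, t in t_stats.items()}

for name, t in t_stats.items():
    verdict = "significant" if significant[name] else "not significant"
    print(f"{name:<9} t = {t:9.4f}  {verdict}")
```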

Durbin-Watson Statistic 

See the discussion on the pitfall of AUTO¬ 
CORRELATION. 

Correlation 

Any two variables may vary with each other.
The extent to which they do so is called their
CORRELATION. If the positive change in one
variable is directly and perfectly related to the
positive change in another variable, then the
CORRELATION between these two variables will be +1.0;
if one decreases as the other increases, then the
CORRELATION will be -1.0. If the relationship is
imperfect and positive, the correlation will be
between 0 and 1.0. If there is no relationship
between the change in one variable and the change in
another, then the CORRELATION will be 0.0.

CORRELATION between a set of variables is
often displayed in a CORRELATION MATRIX
(see Example 2) which is read in much the same
way as distances between cities on a roadmap. By
locating the second number in the third row, you
have found the correlation between the second and
third variable.
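
The end points of the CORRELATION scale can be illustrated
with a short sketch. The series below are hypothetical numbers
chosen for the illustration, not data from the article, and the
function is the ordinary product-moment correlation, which is
assumed to be what the CORRELATION MATRIX reports.

```python
import math


def correlation(xs, ys):
    """Product-moment correlation between two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical series for illustration only.
income = [1.0, 2.0, 3.0, 4.0, 5.0]
rises_with_income = [2.0 * x + 1.0 for x in income]  # perfect positive link
falls_with_income = [10.0 - x for x in income]       # perfect negative link

print(correlation(income, rises_with_income))  # close to +1.0
print(correlation(income, falls_with_income))  # close to -1.0
```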




EXAMPLE 2


CORRELATION MATRIX

          SALES       YD        C
SALES    1.0000
YD       0.9958    1.0000
C        0.9970    0.9997    1.0000
RU      -0.4807   -0.4124   -0.4239



This example shows the CORRELATION between
the Retail Company's Sales (SALES), Personal
Consumption Expenditures (C), Disposable
Income (YD) and the Unemployment Rate (RU).
Note that although the Unemployment Rate has a
relatively low CORRELATION with SALES
(-0.48), the regression results of Example 1 show
that Unemployment does contribute significantly
to explaining SALES when combined with Disposable
Income. This is one reason why regression
equations are more useful than simple
CORRELATIONS for forecasting.

REGRESSION PITFALLS: 

RECOGNIZING AND DEALING WITH THEM 

The two major pitfalls encountered in regression
analysis are COLLINEARITY and AUTOCORRELATION.
The following explains what each of
these pitfalls is, how to recognize it, and how to
deal with it.


EXAMPLE 3


ORDINARY LEAST SQUARES

FREQUENCY: ANNUAL
INTERVAL: 59:1 TO 71:1
LEFT-HAND VARIABLE: SALES

Note the unexpected negative estimated coefficient on
Disposable Income (YD).

RIGHT-HAND    ESTIMATED      STANDARD
VARIABLE      COEFFICIENT    ERROR       T-STATISTIC
CONSTANT      -1937.13       355.049     -5.45595
C             40.4550        15.3383     2.63752
YD            -19.8445       13.3504     -1.48643

R-BAR SQUARED: 0.9941
DURBIN-WATSON STAT. (ADJ. FOR 0. GAPS): 0.9343
STANDARD ERROR OF THE REGRESSION: 159.836

DATE    ACTUAL    FITTED
59      4036.     3959.
60      4134.     4274.
61      4268.     4390.
62      4603.     4781.
63      5116.     5204.
64      5740.     5600.
65      6390.     6182.
66      6805.     6771.
67      7330.     7128.
68      8198.     8026.
69      8863.     8916.
70      9262.     9331.
71      10006     10190

[PLOT: * = ACTUAL; + = FITTED]


Collinearity 

When using more than one independent variable
in a regression equation, there is sometimes a high
correlation between the independent variables
themselves. COLLINEARITY is when these
explanatory variables interfere with each other. It is a
pitfall because the use of equations with
COLLINEARITY may produce spurious forecasts.

COLLINEARITY can be recognized when the
regression T-STATISTICS of two seemingly
important explanatory variables are low. Often
COLLINEARITY even causes the ESTIMATED
COEFFICIENTS on explanatory variables to have the
opposite sign from that which would logically be
expected. Example 3 shows a case of COLLINEARITY
between Disposable Income and Consumption
in explaining the sales of the Retail Company. It is
not reasonable to expect Sales to decrease as
Disposable Income increases, as indicated by the
negative coefficient on Disposable Income.

The COLLINEARITY becomes apparent when
the variables are compared in a correlation matrix.
Looking back at the CORRELATION MATRIX of
Example 2 shows that Disposable Income and
Consumption are, in fact, highly correlated
(0.9997).

There are two ways to get around the problem
of COLLINEARITY. First, one of the highly
correlated variables may be dropped from the
regression. In Example 4, consumption was dropped
from the regression. Note that the R-BAR SQUARED
and STANDARD ERROR OF THE REGRESSION
are not affected appreciably, but the T-STATISTIC
on Disposable Income improves dramatically now
that consumption no longer interferes with it.


EXAMPLE 4


ORDINARY LEAST SQUARES

FREQUENCY: ANNUAL
INTERVAL: 59:1 TO 71:1
LEFT-HAND VARIABLE: SALES

Note the low DURBIN-WATSON statistic indicating positive
AUTOCORRELATION of the errors.

RIGHT-HAND    ESTIMATED
VARIABLE      COEFFICIENT
CONSTANT      -1123.73
YD            15.3558

R-BAR SQUARED: 0.9909
DURBIN-WATSON STAT. (ADJ. FOR 0. GAPS): 0.8383
STANDARD ERROR OF THE REGRESSION:

DATE    ACTUAL    FITTED
59      4036.     4056.
60      4134.     4251.
61      4268.     4472.
62      4603.     4792.
63      5116.     5089.
64      5740.     5604.
65      6390.     6143.
66      6805.     6736.
67      7330.     7266.
68      8198.     7952.
69      8863.     8618.
70      9262.     9464.
71      10006     10307







The second way around COLLINEARITY is to
change the structure of the equation. One
approach is to divide both left and right hand
variables by some series which will leave the basic
economic logic but remove the collinearity. For
the Retail Company, population might be a good
candidate. Estimating Sales per capita by
consumption per capita and Income per capita, though
different from the original formulation (the simple
level of sales), may diminish the COLLINEARITY
problem. Another structure which usually reduces
the COLLINEARITY is to estimate the equation
on a first difference basis; that is, estimate the
change in Retail Company Sales on the change in
Income and the change in Consumption. Another
way out is to combine the collinear variables into a
new variable which is their weighted sum. Often
the weights can be derived from economic
reasoning.
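
The first difference transformation mentioned above can be
sketched as follows. This is an illustration only: the helper
name first_difference is hypothetical, and only the ACTUAL sales
column from the examples is used because the article does not
list the Income and Consumption series. The same helper would be
applied to each explanatory variable before re-estimating.

```python
def first_difference(series):
    """Return the period-to-period changes of a series."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# ACTUAL sales of the Retail Company, 1959 to 1971 (from the examples).
sales = [4036, 4134, 4268, 4603, 5116, 5740, 6390,
         6805, 7330, 8198, 8863, 9262, 10006]

sales_change = first_difference(sales)

# One observation is lost: thirteen levels give twelve changes.
print(len(sales_change))
print(sales_change[:3])
```

Differencing removes the common upward trend that makes level
series such as Income and Consumption move together, at the cost
of one observation.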

Autocorrelation 

AUTOCORRELATION is the other major pitfall
often encountered in regression analysis. One
principal assumption of regression analysis is that 
the errors (between the FITTED and ACTUAL 
values) are independent from one observation to 
the next. That is, knowledge of the error in one 
year will not help you anticipate the error in the 
next year. AUTOCORRELATION is the case 
where there is a correlation between successive 
errors. It is a pitfall because the error between the 
FITTED and ACTUAL value in the last observa¬ 
tion is apt to persist into the forecast interval. 

The DURBIN-WATSON STATISTIC provides
the standard test for AUTOCORRELATION.
Generally, if the DURBIN-WATSON STATISTIC is
between 1.5 and 2.5, there is not serious
AUTOCORRELATION in a regression equation. If the
DURBIN-WATSON is below 1.5, there may be
positive AUTOCORRELATION between errors. In
this case, a positive error in one observation will
usually indicate a positive error in the subsequent
observation. If the DURBIN-WATSON is above
2.5, negative AUTOCORRELATION may be
indicated, and the errors generally alternate negative
and positive.

Example 4 illustrates serious positive
AUTOCORRELATION. Note that the DURBIN-WATSON
STATISTIC is 0.8383 in this regression.
The FITTED and ACTUAL values shown in
expanded scale in Example 5 illustrate the
persistence of the errors. If this equation were used to
forecast 1972 Sales, the resulting estimate would
probably be high because of the positive
AUTOCORRELATION and the overestimate in 1971.
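
The DURBIN-WATSON STATISTIC quoted for Example 4 can be
reproduced from the ERROR column of Example 5. The sketch below
assumes the standard definition of the statistic (sum of squared
changes in successive errors divided by the sum of squared
errors), which the article does not spell out; the thresholds
are the 1.5 and 2.5 rules of thumb given in the text.

```python
# ERROR column of Example 5 (FITTED minus ACTUAL, 1959 to 1971).
errors = [20.0288, 117.493, 204.310, 189.372, -26.6915, -136.393,
          -246.728, -68.8075, -64.1996, -246.469, -245.194,
          202.497, 300.780]


def durbin_watson(errs):
    """Assumed standard definition of the Durbin-Watson statistic."""
    changes = sum((b - a) ** 2 for a, b in zip(errs, errs[1:]))
    total = sum(e ** 2 for e in errs)
    return changes / total


def diagnose(dw):
    """Apply the rule-of-thumb thresholds from the text."""
    if dw < 1.5:
        return "possible positive autocorrelation"
    if dw > 2.5:
        return "possible negative autocorrelation"
    return "no serious autocorrelation"


dw = durbin_watson(errors)
print(f"Durbin-Watson: {dw:.4f} ({diagnose(dw)})")
```

Errors that keep the same sign for several years in a row, as
they do here, produce small successive changes and therefore a
statistic well below 2.0.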



EXAMPLE 5


DATE    ACTUAL      FITTED      ERROR
59      4036.00     4056.03     20.0288
60      4134.00     4251.49     117.493
61      4268.00     4472.31     204.310
62      4603.00     4792.37     189.372
63      5116.00     5089.31     -26.6915
64      5740.00     5603.61     -136.393
65      6390.00     6143.27     -246.728
66      6805.00     6736.19     -68.8075
67      7330.00     7265.80     -64.1996
68      8198.00     7951.53     -246.469
69      8863.00     8617.81     -245.194
70      9262.00     9464.50     202.497
71      10006.0     10306.8     300.780

[PLOT: ACTUAL and FITTED values, 1959 to 1971]




Usually AUTOCORRELATION indicates that 
there is an important part of the variation of the 
dependent variable which has not been explained. 
The best solution to this problem is to search for 
other explanatory variables to include in the
regression equation. In the case of the Retail
Company's Sales, including the Unemployment Rate
with Disposable Income reduces the
AUTOCORRELATION to an insignificant level (the
DURBIN-WATSON in Example 1 is 1.9724).

Sometimes no additional variables can be
identified to reduce AUTOCORRELATION to an
insignificant level. In this case, forecasts from the
equation with AUTOCORRELATION should be
modified to reflect the error in the last observation. One
technique for phasing out the effect of the last
observed error is to calculate a factor (usually
called rho) which is half of the difference between
2.0 and the DURBIN-WATSON. In Example 4, rho
= (2.0 - 0.8383)/2 = 0.581. This factor is multiplied
times the last observed error (ACTUAL minus FITTED)
and the result is added
to the forecast from the equation for the first
period. In Example 4, 0.581(10010 - 10310) =
-174, which should be added to any forecast for
1972 developed from the regression equation
of Example 4. Use of the same factor (e.g. .581)
times the amount added to the first forecast period
(e.g. -174) provides the amount to be added to
the equation result for the second forecast period
(e.g. .581(-174) = -101). This process is
continued through the forecast interval until the
modification to the equation results is reduced to an
insignificant level.
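
The phasing-out procedure just described can be sketched as
follows. It is a minimal illustration of the arithmetic in the
text, not a general forecasting routine: the function name
adjustments is hypothetical, and the last observed error uses
the rounded 1971 figures quoted in the article (10010 and
10310).

```python
DURBIN_WATSON = 0.8383  # Example 4

# rho is half of the difference between 2.0 and the Durbin-Watson.
rho = (2.0 - DURBIN_WATSON) / 2.0

# ACTUAL minus FITTED for 1971, rounded as in the text.
last_error = 10010 - 10310


def adjustments(rho, last_error, periods):
    """Amounts to add to the equation forecast in successive periods."""
    amounts = []
    amount = last_error
    for _ in range(periods):
        amount = rho * amount  # each period carries rho times the previous
        amounts.append(amount)
    return amounts


for year, amount in zip(range(1972, 1977), adjustments(rho, last_error, 5)):
    print(f"{year}: add {amount:8.1f} to the equation forecast")
```

Because rho is below 1.0, the adjustment shrinks each period and
eventually becomes negligible, which is the point at which the
text suggests stopping.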

SELECTING THE BEST EQUATION 

With the advent of timesharing, large data bases
and English-language regression software, the
problem of developing regression equations has shifted
from getting an equation estimated to selecting
among several equations which have been
estimated easily. Choosing among alternative
forecasting equations is somewhat like handicapping a
horserace. First, the obvious losers must be
eliminated, then the winner must be selected from
among the remaining contenders.

Some equations fail to meet standard acceptance 
criteria and can be eliminated easily. Selecting the 
best equation is more difficult, and involves a close 
analysis of each equation’s track record. The fol¬ 
lowing principles have proven useful in selecting 
appropriate forecasting equations. 

Eliminating the Losers 

1. Does the equation make sense? 


There must be some plausible causality be¬ 
tween the dependent and each of the indepen¬ 
dent variables. This criterion immediately 
eliminates sunspot-and-stock-market equa¬ 
tions. It also eliminates equations as in 
Example 3 where COLLINEARITY causes 
one explanatory variable (e.g. Disposable In¬ 
come) to take on the opposite sign from that 
which would be reasonably expected. Other 
tests for sensibility involve comparability of 
data on both the left and right hand side of 
the equation. If the dependent variable is in
units or constant dollars, then the independent
variables should not be in current dollars.
If the dependent variable is seasonally
adjusted, the independent variables should be
also. Note that in Example 1, the unemployment
rate is obviously in different units than
sales of the Retail Company. The equation
would make more sense if the dollars of
income foregone due to unemployment were
used instead of the rate of unemployment.

2. Low T-STATISTICS

Equations which have explanatory variables 
with low T-STATISTICS should be re- 
estimated or dropped in favor of equations in 
which all independent variables are signifi¬ 
cant. This test will eliminate equations where 
COLLINEARITY remains a problem (see the 
Pitfall section on COLLINEARITY for sug¬ 
gestions of alternative equation structures 
which may sidestep the COLLINEARITY 
problem). 

3. Low R-BAR SQUARED

The R-BAR SQUARED can be used to rank
the remaining equations and to select the best
three or four candidates. (Note: this approach
usually produces the same selections as choosing
among the remaining equations by
minimizing the STANDARD ERROR OF THE
REGRESSION.)

Choosing the Best Equation 

1. Turning points and recent accuracy 

Examining the (ex post) performance of
FITTED and ACTUAL values provides the
easiest basis for selecting one equation from
those which have not been eliminated. The
equation which best catches the turning
points and whose recent accuracy is best
generally provides the best basis for
forecasting. This criterion will usually be sufficient to
select the equation which will be the winner.

2. Ex Ante Tests 

Estimating an equation over part of the
historic data and using it to forecast the
remainder of the historic interval provides a means
of checking forecasting accuracy. Since the
forecasts are made over an interval beyond
that used to estimate the equation, the
forecasts are called ex ante. With this test, the
equation whose estimated historic values are
closest to the ACTUAL historic results is
superior. This test is generally required only
when an examination of turning points and
recent accuracy fails to identify a clearly
superior choice.

3. Best DURBIN-WATSON STATISTIC 

Given equations which survive all previous
tests of comparability, the equation with the
DURBIN-WATSON STATISTIC closest to 2.0
can be a basis for selection. Usually the
previous criteria will have been sufficient to make
a choice among equations. This test is last
because it is better to have an equation with a
smaller error (i.e. higher R-BAR SQUARED
or lower STANDARD ERROR OF THE
REGRESSION) which persists (i.e. poorer
DURBIN-WATSON), than a larger error
which is random. If all else fails, autocorrelated
errors can be phased out mechanically
(see the end of the discussion of the Pitfall of
AUTOCORRELATION).


1 A book that is often used to explicate these concepts in
a more rigorous fashion is J. Johnston's Econometric
Methods, published in a 2nd Edition in 1972 by McGraw-Hill
Book Company. Klein, L. R., An Introduction to
Econometrics, Prentice Hall, 1962 is a good reference for
elementary study. Wonnacott, R. J. and Wonnacott, T. H.,
Econometrics, Wiley, 1970 is often recommended for
intermediate study. Finally, Theil, H., Principles of Econometrics,
Wiley, 1971 is a standard advanced text.




F STATISTIC 

The F statistic tests the null hypothesis that the
coefficients of the independent variables are all equal to
zero. If the hypothesis is supported, this would indicate
that there is no important relationship between the
independent variables and the dependent variable. If the
F statistic does not support the hypothesis, then this
would indicate that there is an important and predictive
relationship between one or more of the independent
variables and the dependent variable. The t-statistic can
then be used to select which individual independent
variables are most highly related to the dependent
variable.

A low F value is considered to be supportive of the 
null hypothesis and a high F value is considered to be 
counter-indicative of the null hypothesis. 

The F value is supplied with its degrees of freedom,
and an F table in a statistics book should be consulted to
find the appropriate F level by which to judge whether or
not your F value is high or low (i.e., significant or
non-significant).





The F statistic is only supplied by Tiny Troll when 
more than two variables are used in the REGRESSION 
command. 
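
As an illustration of the size an F value can take, the sketch
below derives one from the R-BAR SQUARED of Example 1 in the
reprinted article. It rests on assumptions that are not stated
in this manual: that R-BAR SQUARED is the usual adjusted R
squared, and that F is related to R squared by the standard
textbook formula. It is not a description of how Tiny Troll
computes the statistic.

```python
n = 13                  # annual observations, 1959 to 1971
k = 2                   # independent variables (YD and RU)
r_bar_squared = 0.9971  # Example 1

# Undo the assumed degrees-of-freedom adjustment to recover R squared.
r_squared = 1 - (1 - r_bar_squared) * (n - k - 1) / (n - 1)

# Assumed textbook relationship between F and R squared.
f_value = (r_squared / k) / ((1 - r_squared) / (n - k - 1))

# Degrees of freedom to use when looking up an F table.
degrees_of_freedom = (k, n - k - 1)

print(f"F = {f_value:.0f} with {degrees_of_freedom} degrees of freedom")
```

An F value in the thousands is far above the critical levels
tabulated for 2 and 10 degrees of freedom, so the null
hypothesis would be rejected for this equation.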

VARIANCE-COVARIANCE MATRIX 

The variance-covariance matrix describes the
significance and interrelationships of the coefficients.
Multiple linear regression assumes that the coefficient
estimates are jointly normally distributed, with the
variance-covariance matrix as the relevant parameter. An
example of interpreting the matrix is given below:


CFT        0        1        2

0        .45     -.21      .49

1       -.21      .62      .01

2        .49      .01      .33



Note that this matrix is symmetric and
positive-definite. The terms along the diagonal are the
variances of our estimates. Thus, for coefficient 1, the
standard error of our estimate is the square root of .62.
The covariance of 1 with 2 is .01, which means that there
is essentially no relationship between the two
coefficients. If there is a strong relationship between
all the coefficients, this is sometimes referred to as
multi-collinearity, which means that our estimates of the
coefficients are rather diffuse.
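
The reading of the matrix described above can be sketched in a
few lines. The figures are the illustrative ones from the
example matrix; expressing a covariance as a correlation
between two coefficient estimates is an added convenience that
the text does not itself use.

```python
import math

# Example variance-covariance matrix for coefficients 0, 1 and 2.
cov = [
    [0.45, -0.21, 0.49],
    [-0.21, 0.62, 0.01],
    [0.49, 0.01, 0.33],
]

# Standard error of each coefficient: square root of its diagonal term.
std_errors = [math.sqrt(cov[i][i]) for i in range(len(cov))]


def coefficient_correlation(i, j):
    """Covariance of estimates i and j scaled by their standard errors."""
    return cov[i][j] / (std_errors[i] * std_errors[j])


print(f"standard error of coefficient 1: {std_errors[1]:.4f}")
print(f"correlation of 1 with 2: {coefficient_correlation(1, 2):.4f}")
```

Scaling by the standard errors makes the small covariance of 1
with 2 easier to judge: the result is close to zero, in line
with the statement that the two coefficients are essentially
unrelated.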




830 


AMERICAN STATISTICAL ASSOCIATION JOURNAL, SEPTEMBER 1967



n^^ ss °R ^^1 ADLES^^ElijlN PROBLp 


TABLE 1. APPENDIX

X1 = GNP Implicit Price Deflator (1954 = 100)
X2 = Gross National Product
X3 = Unemployment
X4 = Size of Armed Forces
X5 = Noninstitutional Population 14 Years of Age and Over
X6 = Time

                                                            Add-Check
    X1         X2       X3       X4         X5      X6        Column

   83.0    234,289    2,356    1,590    107,608    1947     347,873.0
   88.5    259,426    2,325    1,456    108,632    1948     373,875.5
   88.2    258,054    3,682    1,616    109,773    1949     375,162.2
   89.5    284,599    3,351    1,650    110,929    1950     402,568.5
   96.2    328,975    2,099    3,099    112,075    1951     448,295.2
   98.1    346,999    1,932    3,594    113,270    1952     467,845.1
   99.0    365,385    1,870    3,547    115,094    1953     487,948.0
  100.0    363,112    3,578    3,350    116,219    1954     488,313.0
  101.2    397,469    2,904    3,048    117,388    1955     522,865.2
  104.6    419,180    2,822    2,857    118,734    1956     545,653.6
  108.4    442,769    2,936    2,798    120,445    1957     571,013.4
  110.8    444,546    4,681    2,637    121,950    1958     575,882.8
  112.6    482,704    3,813    2,552    123,366    1959     614,506.6
  114.2    502,601    3,931    2,514    125,368    1960     636,488.2
  115.7    518,173    4,806    2,572    127,852    1961     655,479.7
  116.9    554,894    4,007    2,827    130,081    1962     693,887.9

1,626.9  6,203,175   51,093   41,707  1,878,784  31,272   8,207,657.9


SOURCE OF DATA: For a complete listing of the Implicit Price Deflators for Gross National Product see
Council of Economic Advisers, ECONOMIC REPORT OF THE PRESIDENT, January 1964, Table C-6, p. 214.
For Gross National Product (in current prices) see U. S. Department of Commerce, Office of Business Economics,
"Survey of Current Business," July 1963, p. 12. For Unemployment, Size of Armed Forces, and Noninstitutional
Population ... see "Employment and Earnings" dated September 1963, Vol. 10, No. 3, Table A-1, p. 1.




LEAST SQUARES PROGRAMS FOR THE ELECTRONIC COMPUTER 


831 




REGR1 


Year 


1017 
10 IS 
1019 

1050 

1051 
10321 
1033 
1054' 


1055 

1950 

1957 

1053 

1050 

1000 

1961 

1002 


r&er 

Total 

Derived 

Employ- 


60,323 
01,122 
CO,171 
61.1S7 
03,221 
03,030 
61,039 
_63.701 — 
C6,010 
67,857 
08,169 
60,513 
6S,G35 
69,561 
60,331 
70,551 ^ 


C ACr 

Census 

Agricultural 

Employ- 


8 

7,000 

8,017 

7,107 

7,018 

6,702 

6,555 

_6,495 - 

6,718 
6,572 
6,222 
6,811 
' 5,830 . 
5,723 ' 
6,163 
6,100 


TABLE 2. APPENDIX
REGRESSANDS USED IN TEST PROBLEM


C SG 

Census 

Self- 

Employed* 


4 - 


6,015 
6,130 
0,208 
6,000 
5,800 
5,070 
5,701 

6,550_ 

5,8S6 
5,930 
,-0,0$).. 
^-6JS5j 
6,298 
6,307 
6,388 
6,271 


cur^C 

Census 

Unpaid 

Family 

Worker# 

-S&- 

.4 27 
101 
300 
101 
100 
431 
423 

_.415 - 

521 

581 

620 

605 

507 

615 

662 

623 


CO 

Census 

Domestics 


1,711 
1,731 
1,772 
1,005 
2,055 
1,022 
1,085 
.1,910 
2,210 
2,359 
2,328 
2,150 
2,520 
2,189 
2,594 
2,626 


BI.S Nun- 
agricultural 
Private 
Number of 


m 




38,107 
39,211 
37,922 
39,190 
11,ICO 
42,210 
43.5S7 
42,271 
43,701 
45,131 
45,278 
43,530 
45,214 
45,850 
45,397 
46,052 


13LS 

Federal 

Government 


1,892 

1,803 

1,008 

1,023 

2,302 

2,120 

2,305 

2,188 

2.1R7 

2,200 

2,217 

2,101 

2,233 

2,270 

2,270 

2,340 


S L£r 

BLS Stnto 
and Local 
Government 


3,532 

3,767 

3.0 IS 

4,008 

4,087 

4,ISS " 

4,310 

4,50.1 

4,727 

6,000 

6,400 

5,702 

5,057 

0,250 

6,518 


SOURCE OF DATA: For self-employed, unpaid family workers, and domestics see United States Department
of Labor, "Manpower Report of the President and a Report on Manpower Requirements, Resources, Utilization
and Training," March 1963, Table A-4, p. 141. For Agricultural employment see United States Department of
Labor, "Employment and Earnings," Vol. 10, No. 3, September 1963. For Bureau of Labor Statistics employment
see Table B-1, p. 13, of the same issue of "Employment and Earnings."



TABLE 3. APPENDIX 

EQUATIONS COMPUTED ON A DESK CALCULATOR 1 



X1

X2

X3

X4

X5

X6

Constant

Total 

+ 15.0C1S7227 

-0.035S1917 

—2.02022'jSO 

—1.03322G80 

-0.05110119 

+1S29.15146461 

—3482253.6330 

Agriculture 

—59.34820937 

-0.000231S9 

-0.09130328 

—0.31651G91 

-0.09950747 

+ 123.69160185 

— 21G1S3.1U0 

Self Employed 

+27.94670440 

-0.0001S918 

+0.0-13S7S61 

-0.24220910 

+0.01335594 

- 84.24950155 

+ 1G34S2.24*10 

Unpaid Family Workers 

+ 5.S577S104 

-0.00381173 

-0.07-145272 

-0.07058316 

+0.00309300 

+ 96.00993412 

- ISO1S9.1600 

Domestics 

-27.79203555 

+0.01S02730 

+0.19094872 

—0.0G330130 

—0.14356789 

- Cl.04619384 

+ 131S76.7460 

Non-Ag. Private Employment

+39.42427550 

—0.00599025 

—2.31GGG7SG 

-0.31213923 

+0.0191SGG7 

+2090.2SS57349 

-4030300.3830 

Federal Government 

+ 3.39252205 

+0.01070351 

+0.12617553 

+0.20303100 

—0.025S6SS3 

- 190.76061018 

+ 3S4319.1C70 

Total State & Local Gov't.

+25.5S1S24U 

+0.0CG00907 

+0.09789619 

-0.20117874 

+0.1219035S 

- 114.16233725 

+ 267714.8660 

State & Local Education

+ 1S.0IIG23S2 

+0.00245790 

+0.C5530S12 

—0.1114G991 

+0.00571039 

- 10.12361093 

+ 19167.0S20 

Other State & Local Gov't.

+ 7.5402002S 

+0.00355116 

+0.0125SS07 

—0.0S970S79 

+0.11619313 

- 134.05S72C61 

+ 218547.1870 

Added Total 

+ 15.06137227 

-0.035S1917 

—2.02022081 

—1.03322G84 

-0.05110110 

+ 1829.15146463 

—34S2258.6340 


1 Estimates are not rounded. See Table 1, Appendix for identification of X-variables.
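The "Added Total" row of Table 3 appears to be the column-by-column sum of the seven component equations (Agriculture through Total State & Local Gov't.), set beside the equation fitted directly to Total. Because least squares is linear in the dependent variable, the two rows should agree whenever every equation uses the same X-variables. The following is a minimal sketch of that property in Python with NumPy. The data are synthetic and the names `components`, `ols`, and `max_gap` are illustrative assumptions; nothing here reproduces the article's figures.

```python
import numpy as np

# Synthetic stand-in for the test problem: 16 observations, six X-variables
# plus a constant. These are NOT the values printed in the appendix tables.
rng = np.random.default_rng(0)
n_obs, n_x = 16, 6
X = np.column_stack([rng.normal(size=(n_obs, n_x)), np.ones(n_obs)])

# Seven hypothetical component series whose sum defines the total.
components = rng.normal(size=(n_obs, 7))
total = components.sum(axis=1)


def ols(design, y):
    """Least squares coefficients of y on the columns of design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# One equation per component, then add the coefficients column by column.
component_coefs = np.column_stack(
    [ols(X, components[:, j]) for j in range(components.shape[1])]
)
added_total = component_coefs.sum(axis=1)

# The equation fitted directly to the total.
direct_total = ols(X, total)

# Largest disagreement between the two routes, a measure of rounding error.
max_gap = float(np.max(np.abs(added_total - direct_total)))
print(max_gap)
```

On well-conditioned synthetic data the gap is at the level of floating-point rounding. With highly collinear X-variables the same comparison becomes a sensitive test of a program's arithmetic.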




AMERICAN STATISTICAL ASSOCIATION JOURNAL, SEPTEMBER 1967



































LEAST SQUARES PROGRAMS FOR THE ELECTRONIC COMPUTER


TABLE 4. APPENDIX 

INVERSE COMPUTED ON A DESK CALCULATOR 1




X1

X2

X3

A'i + 1.241 378 001 792 1S7 131 

-0.000 317 000 330 5S7 205 

-0.003 902 073 510 800 177 


Xi-O.COO 317 9C0 330 5S7 207 

+ 0.000 000 103 101 500 700 

+ 0.000 002 CO2 872 718 053 


A'i — 0.003 9G2 073 510 500 173 

+ 0.000 002 GOO 872 748 053 

+0.000 011 Odd 408 402 878 


A*.-0.001 092 G59 072 930 030 

+ 0.000 000 57S 019 700 602 

+0.000 Oil 14 1 Cf.2 540 240 


A'i+0.002 178 572 7G0 23S 747 

-0.000 001 080 0S0 303 055 

-0.000 014 413 732 922 GG9 


A'i+1.140 JOS 392 715 150 309 

-0.002 105 395 015 790 737 

-0.031 501 G59 303 212 022 


X4

X5

X6


Xi-0.001 092 G50 072 030 037 

+0.002 178 572 700 233 743 

+- 1.240 408 302 745 150 390 


A'i+0.000 000 57S 049 700 002 

-0.000 001 0S0 0S9 303 055 

-0.002 105 395 045 709 771 


A'i+0.000 Oil 111 CG2 540 240 

-0.000 014 413 732 922 GG9 

-0.031 501 659 303 212 010 


Xi+0.000 007 001 521 641 005 

-0.000 001 575 500 209 700 

-0.009 230 737 794 491 538 


Xi-0.000 001 575 500 299 700 

+0.000 00S 799 010 6S2 090 

+0.000 881 190 945 701 703 


X|—0.009 230 737 794 491 500 

+0.000 881 100 045 701 720 

+•35.716 733 905 618 601 003 



Check Column 




A'i -f 2.478 591 C77 311 192 853 



/ A".-0.002 421 000 M l 674 SS9 


A'i-0.035 4S3 872 030 303 CS2 
A'.-0.010 305 311 230 773 330 
A'.+0.009 051 487 391 097 128 
A'i + 30.921 125 787 1G5 912 031 




1 See Table 1, Appendix for identification of X-variables.
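As far as the scan can be read, the first entry of the Check Column in Table 4 matches the sum of the six entries in the first row of the inverse to about six digits, which suggests the column holds row sums carried along as an arithmetic check. The sketch below illustrates that kind of check in Python with NumPy, under that assumption. The moment matrix `A` is synthetic, and the names `row_sums`, `check`, and `asymmetry` are illustrative and not taken from the article.

```python
import numpy as np

# Synthetic symmetric moment matrix standing in for the 6 x 6 matrix of
# sums of squares and cross products. NOT the article's data.
rng = np.random.default_rng(1)
Z = rng.normal(size=(16, 6))
A = Z.T @ Z

A_inv = np.linalg.inv(A)

# Row sums of the computed inverse, the quantity a check column would hold.
row_sums = A_inv.sum(axis=1)

# Independent route to the same numbers: solving A x = (1, ..., 1) gives
# x = A_inv @ ones, which is exactly the vector of row sums of the inverse.
check = np.linalg.solve(A, np.ones(A.shape[0]))
check_gap = float(np.max(np.abs(row_sums - check)))

# The inverse of a symmetric matrix is symmetric, so any asymmetry in the
# computed inverse is a second measure of accumulated rounding error.
asymmetry = float(np.max(np.abs(A_inv - A_inv.T)))

print(check_gap, asymmetry)
```

Both quantities are near machine precision for this well-conditioned example. A large value for either would signal that digits were lost during the inversion.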


TABLE 5. APPENDIX 

REGRESSION EQUATION FOR TOTAL EMPLOYMENT DERIVED FROM 
IBM 1401 MATRIX INVERSION PROGRAM DEVELOPED FOR INTERNAL 
USE IN THE BUREAU OF LABOR STATISTICS 


Regression Equation 


n 

+15.061872271373 

Xi 

- —1.03322GSG7173 

XI 

-0.035819179292 

XI 

—0.051104105653 

XI 

-2.020229803810 

Z4 

+1829.151401013551 

Constant 1 


Inverse Matrix



X1

+ 1.24137S001792IS713.0 
—0.0003170603305S7204 
-0.003U62073510SGG173 
-0.001092050072030033 
+ 0.002178572760238717, 

+ 1.240-108302745150398" 

X2

- 0.000317000:13055720-1 
+0.000000193101500000 
+ 0.0000020028727-1S052 
+ 0.0000005789497C0C0j 

- 0.000001CS005930305J 
—0.002105395015700737 

X3

- 0.0O39C2073.540SC0173 
+0.00000200287274S053 
+0.0000 llOOfl f0£402S7£ 
+0.00001111IGG251024$ 

- 0.00001441373292200$ 

-0.0315G105030321202j> 


X4

—0.001092050072930033 
+0.00000057891070000+ 
+0.000011141602546245, 

+0.000007001521641001 
-0.000001575500290090 
-0.009230737701491500 j 

X5

+0.002178572700238742 

- 0.0000010ti-;0S'J30305i 

- 0.00001111373292260S 
-0.000001575500299092 
+0. OOOOOS7UOO10082090 
-+0.0OC8S1100915701729 

X6

+1.240-10839274515039$, 
—0.002105395015700737 
-0.031501C5930321202 Z 
-0.009230737794191500 
+0.00G881190915701729 
+35.710733995018561001 



1 Constant was not computed. For identification of X-variables, see Table 1, Appendix. The routine followed
the Gaussian elimination process. The matrix, carried to forty digits, was symmetrical to 28 decimals. The digits are
truncated.































